{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 电机异音AI诊断\n",
    "\n",
    "本项目为飞桨领航团AI达人创造营第二期作业。\n",
    "\n",
    "## 项目背景\n",
    "在电机生产线上普遍采用人工听音的方法分辨良、次品，不仅成本高，而且重复、单调的听音工作极易引起人员疲劳，容易出现误判，若个别不良品混入整批成品中，会给工厂带来严重经济损失，甚至严重影响产品声誉。\n",
    "\n",
    "（数据对应的比赛）本次大赛要求参赛者基于加速度传感器采集的振动信号，利用机器学习、深度学习等人工智能技术，设计智能检验的算法，要求算法对故障电机不能有漏识别，在召回100%的情况下，尽量提高预测准确率，以达到替代人工质检的目的。\n",
    "\n",
    "## 数据说明\n",
    "\n",
    "文件清单：\n",
    "\n",
    "1. Motor_tain.zip：用于训练的采集数据，其中，文件夹“正样本”包含30个异常电机的数据样本，文件夹“负样本”包含500个正常电机的数据样本；\n",
    "2. Motor_testP.zip：用于测试的采集数据，包含500个电机的数据样本；  \n",
    "文件说明：采集数据时是分别对电机正转、反转时的振动信号进行采集。也就是说每台电机有两条数据，其中F代表正转，B代表反转。每条数据包含两路振动信号，数据文件命名规则：编号_旋转方向.csv。比赛链接：http://jingsai.julyedu.com/v/25820185621432447/dataset.jhtml\n",
    "\n",
    "## 项目方案\n",
    "\n",
    "每个样本包含F和B两份csv，每个csv文件包含79999×2个数据。使用多个全连接层进行预测。使用的损失函数为交叉熵。\n",
    "\n",
    "模型如下：\n",
    "\n",
    "![](https://ai-studio-static-online.cdn.bcebos.com/d1303434dd3a4d0ba2c23c5146be9fe0648c0389ac2741729aaefd97a3855118)\n",
    "\n",
    "## 处理方案\n",
    "\n",
    "1. 重采样：针对数据不平衡，复制多份正向样本，从而训练时达到类别均衡采样的效果\n",
    "2. 修改损失：针对题目要求的保证TP最大，尽可能减小FP的问题，将label==1的损失权重设定为1.1。\n",
    "\n",
    "# 代码\n",
    "\n",
    "## 预处理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false,
    "execution": {
     "iopub.execute_input": "2022-02-15T03:46:41.030000Z",
     "iopub.status.busy": "2022-02-15T03:46:41.029003Z",
     "iopub.status.idle": "2022-02-15T03:47:21.314730Z",
     "shell.execute_reply": "2022-02-15T03:47:21.313682Z",
     "shell.execute_reply.started": "2022-02-15T03:46:41.029937Z"
    },
    "jupyter": {
     "outputs_hidden": false
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "! unzip -oq data/data31822/Motor_tain.zip\n",
    "! unzip -oq data/data31822/Motor_testP.zip"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-02-15T07:02:10.371375Z",
     "iopub.status.busy": "2022-02-15T07:02:10.370471Z",
     "iopub.status.idle": "2022-02-15T07:02:10.382337Z",
     "shell.execute_reply": "2022-02-15T07:02:10.381799Z",
     "shell.execute_reply.started": "2022-02-15T07:02:10.371332Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "train_pos_dir='Motor_tain/╒¤╤∙▒╛/'\n",
    "train_neg_dir='Motor_tain/╕║╤∙▒╛/'\n",
    "\n",
    "# 生成三份文件便于按照类别比例划分\n",
    "with open('train_pos.txt','w') as f:\n",
    "    for item in list(set([item[:len(item)-6] for item in os.listdir(train_pos_dir)])):\n",
    "        if '.ipynb' not in item:\n",
    "            f.write(train_pos_dir+item+'\\t1\\n')\n",
    "with open('train_neg.txt','w') as f:\n",
    "    for item in list(set([item[:len(item)-6] for item in os.listdir(train_neg_dir)])):\n",
    "        if '.ipynb' not in item:\n",
    "            f.write(train_neg_dir+item+'\\t0\\n')\n",
    "\n",
    "with open('train.txt','w') as f:\n",
    "    for item in list(set([item[:len(item)-6] for item in os.listdir(train_pos_dir)])):\n",
    "        if '.ipynb' not in item:\n",
    "            for i in range(int(500/30)):\n",
    "                f.write(train_pos_dir+item+'\\t1\\n')\n",
    "    for item in list(set([item[:len(item)-6] for item in os.listdir(train_neg_dir)])):\n",
    "        if '.ipynb' not in item:\n",
    "            f.write(train_neg_dir+item+'\\t0\\n')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 构造读取器"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-02-19T00:17:50.436139Z",
     "iopub.status.busy": "2022-02-19T00:17:50.435457Z",
     "iopub.status.idle": "2022-02-19T00:17:50.444581Z",
     "shell.execute_reply": "2022-02-19T00:17:50.443821Z",
     "shell.execute_reply.started": "2022-02-19T00:17:50.436095Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "import paddle\n",
    "import numpy as np\n",
    "import paddle.vision.transforms as T\n",
    "from PIL import Image\n",
    "import pandas as pd\n",
    "\n",
    "class MyDateset(paddle.io.Dataset):\n",
    "    def __init__(self,txt_dir):\n",
    "        super(MyDateset, self).__init__()\n",
    "        \n",
    "        self.path=[]\n",
    "        self.label=[]\n",
    "\n",
    "        with open(txt_dir,'r') as f:\n",
    "            for line in f.readlines():\n",
    "                self.path.append(line.split('\\t')[0])\n",
    "                self.label.append(line.split('\\t')[1][0])\n",
    "\n",
    "    def __getitem__(self, index):\n",
    "        path = self.path[index]\n",
    "        label = int(self.label[index])\n",
    "\n",
    "        F=pd.read_csv(path+'_F.csv')\n",
    "        B=pd.read_csv(path+'_B.csv')\n",
    "\n",
    "        F=paddle.to_tensor(F.values).flatten(0).astype('float32')\n",
    "        B=paddle.to_tensor(B.values).flatten(0).astype('float32')\n",
    "\n",
    "        label = np.array(label).astype('int64')\n",
    "\n",
    "        return F,B,label\n",
    "\n",
    "    def __len__(self):\n",
    "        return len(self.label)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-02-19T00:17:53.402397Z",
     "iopub.status.busy": "2022-02-19T00:17:53.401262Z",
     "iopub.status.idle": "2022-02-19T00:17:54.055551Z",
     "shell.execute_reply": "2022-02-19T00:17:54.054757Z",
     "shell.execute_reply.started": "2022-02-19T00:17:53.402360Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0 [16, 159998] [16, 159998] [16]\n"
     ]
    }
   ],
   "source": [
    "train_dataset=MyDateset('train.txt')\n",
    "\n",
    "train_dataloader = paddle.io.DataLoader(\n",
    "    train_dataset,\n",
    "    batch_size=16,\n",
    "    shuffle=True,\n",
    "    drop_last=False)\n",
    "\n",
    "for step, data in enumerate(train_dataloader):\n",
    "    F, B, label = data\n",
    "    print(step, F.shape, B.shape, label.shape)\n",
    "    break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 构造网络模型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-02-19T00:17:56.592846Z",
     "iopub.status.busy": "2022-02-19T00:17:56.592042Z",
     "iopub.status.idle": "2022-02-19T00:17:56.600669Z",
     "shell.execute_reply": "2022-02-19T00:17:56.599434Z",
     "shell.execute_reply.started": "2022-02-19T00:17:56.592809Z"
    },
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "class MyNet(paddle.nn.Layer):\n",
    "    def __init__(self):\n",
    "        super(MyNet,self).__init__()\n",
    "        self.fc_F = paddle.nn.Linear(in_features=159998, out_features=2000)\n",
    "        self.fc_B = paddle.nn.Linear(in_features=159998, out_features=2000)\n",
    "\n",
    "        self.fc_1  = paddle.nn.Linear(in_features=4000, out_features=1000)\n",
    "        self.fc_2  = paddle.nn.Linear(in_features=1000, out_features=200)\n",
    "        self.fc_3  = paddle.nn.Linear(in_features=200, out_features=2)\n",
    "\n",
    "    def forward(self,F,B):\n",
    "        F = self.fc_F(F)\n",
    "        B = self.fc_B(B)\n",
    "        x = paddle.concat([F,B],axis=-1)\n",
    "        x = self.fc_1(x)\n",
    "        # x = paddle.nn.functional.relu(x)\n",
    "        x = self.fc_2(x)\n",
    "        # x = paddle.nn.functional.relu(x)\n",
    "        x = self.fc_3(x)\n",
    "        return x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 训练"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-02-19T00:21:07.099532Z",
     "iopub.status.busy": "2022-02-19T00:21:07.098411Z",
     "iopub.status.idle": "2022-02-19T00:27:25.354693Z",
     "shell.execute_reply": "2022-02-19T00:27:25.353939Z",
     "shell.execute_reply.started": "2022-02-19T00:21:07.099490Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "epoch: 1, batch: 37, loss is: [0.631444]\n",
      "epoch: 3, batch: 13, loss is: [0.63104683]\n",
      "epoch: 4, batch: 51, loss is: [0.63319224]\n",
      "epoch: 6, batch: 27, loss is: [0.5660453]\n",
      "epoch: 8, batch: 3, loss is: [0.5968682]\n",
      "epoch: 9, batch: 41, loss is: [0.5927209]\n"
     ]
    }
   ],
   "source": [
    "model = MyNet()\n",
    "model.train()\n",
    "\n",
    "max_epoch=10\n",
    "opt = paddle.optimizer.SGD(learning_rate=0.001, parameters=model.parameters())\n",
    "\n",
    "now_step=0\n",
    "for epoch in range(max_epoch):\n",
    "    for step, data in enumerate(train_dataloader):\n",
    "        now_step+=1\n",
    "\n",
    "        F, B, label = data\n",
    "        pre = model(F,B)\n",
    "        loss = paddle.nn.functional.cross_entropy(pre,label,weight=paddle.to_tensor([1,1.1]),reduction='mean')\n",
    "        loss.backward()\n",
    "        opt.step()\n",
    "        opt.clear_gradients()\n",
    "        if now_step%100==0:\n",
    "            print(\"epoch: {}, batch: {}, loss is: {}\".format(epoch, step, loss.mean().numpy()))\n",
    "\n",
    "paddle.save(model.state_dict(), 'model.pdparams')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 查看混淆矩阵\n",
    "\n",
    "格式为\n",
    "\n",
    "\n",
    "|  | 预测为0 | 预测为1 |\n",
    "| -------- | -------- | -------- |\n",
    "| 实际为0     | value    | value     |\n",
    "| 实际为1     | value     | value     |\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2022-02-19T00:28:42.509916Z",
     "iopub.status.busy": "2022-02-19T00:28:42.509244Z",
     "iopub.status.idle": "2022-02-19T00:29:23.948685Z",
     "shell.execute_reply": "2022-02-19T00:29:23.948081Z",
     "shell.execute_reply.started": "2022-02-19T00:28:42.509867Z"
    },
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[455.,  45.],\n",
       "       [  0., 480.]])"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# mydict = paddle.load(\"model_1.pdparams\")\n",
    "# model.set_state_dict(mydict)\n",
    "\n",
    "record=np.zeros([2,2])\n",
    "for i in range(len(train_dataset)):\n",
    "    F,B,label=train_dataset[i]\n",
    "    pre=model(F,B)\n",
    "    # print(f'real label: {label} pre label: {np.argmax(pre.numpy())}')\n",
    "    record[label.tolist()][np.argmax(pre.numpy())]+=1\n",
    "record"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 总结\n",
    "\n",
    "本项目使用了极简方式构造了一个电机异音AI诊断模型，最终结果表明模型可以将全部正样本（异常）都判断正确，并且负样本的占比为9%。除去模型效果的因素外，本项目有以下不足，可以继续改进：\n",
    "\n",
    "1. 本项目构造了一个简单的全连接层网络模型，但由于数据量较大，需要在第一个全连接层尽可能地压缩节点数。可以尝试使用卷积替换全连接层。\n",
    "3. 由于样本量较少（正向样本仅30个），仅设置了训练集，没有设定验证集。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 个人介绍\n",
    "\n",
    "> 作者：笠雨聆月\n",
    "\n",
    "> 兴趣：目前从最容易上手的cv进行学习，也在尝试nlp，gan等等，各种方向来者不拒\n",
    "\n",
    "> 个人主页：https://aistudio.baidu.com/aistudio/personalcenter/thirdview/608082"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "请点击[此处](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576)查看本环境基本用法.  <br>\n",
    "Please click [here ](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576) for more detailed instructions. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "py35-paddle1.2.0"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
